DFT Performance Prediction in FFTW
Abstract
Fastest Fourier Transform in the West (FFTW) is an adaptive FFT library that generates highly efficient Discrete Fourier Transform (DFT) implementations. It is one of the fastest FFT libraries available, and it outperforms many adaptive or hand-tuned DFT libraries. Its success relies largely on a huge search space spanned by several FFT algorithms and by a set of compiler-generated C routines (called codelets) for small DFT sizes. FFTW finds the best algorithm empirically, by measuring the performance of different algorithm combinations. Although empirical search works very well for FFTW, it does not explain why the best plan found performs best, and its overhead grows polynomially as the DFT size increases. The alternative to empirical search is model-driven optimization. However, it is widely believed that model-driven optimization is inferior to empirical search, and that it is particularly ill-suited to problems as complex as DFT optimization. In this paper, we propose a model-driven DFT performance predictor that can replace the empirical search engine in FFTW. Our technique adapts to different architectures and automatically predicts the performance of DFT algorithms and codelets (including SIMD codelets). Our experiments on four test platforms show that this technique yields DFT implementations that achieve more than 95% of the performance of those found by the original FFTW, while incurring less than 5% of its search overhead. More importantly, our models give insight into why different combinations of DFT algorithms perform differently on a processor, given its architectural features.
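To make the contrast between empirical search and model-driven prediction concrete, the following Python sketch builds a toy planner. It is an illustration under stated assumptions, not the paper's predictor and not FFTW's planner. A plan is either a directly computed small DFT (a stand-in for a codelet) or one Cooley-Tukey step of radix r followed by a subplan of size n/r. The names `CODELET_SIZES`, `plans`, `execute`, `predicted_cost`, `measured_time`, and `choose_plan` are hypothetical, and `predicted_cost` is a deliberately simple operation count that ignores caches, SIMD, and memory layout, unlike the architecture-aware models the abstract describes.

```python
import cmath
import time

# Hypothetical codelet sizes: in this sketch, small DFTs of these sizes are
# computed directly, standing in for FFTW's generated C codelets.
CODELET_SIZES = (2, 3, 4, 5, 8)


def naive_dft(x):
    """O(n^2) reference DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def plans(n):
    """Enumerate candidate plans for size n.

    A plan is ("codelet", n) or ("ct", r, subplan): one Cooley-Tukey step of
    radix r (r must be a codelet size) followed by a subplan of size n // r.
    Returns [] if n cannot be factored over CODELET_SIZES.
    """
    result = []
    if n in CODELET_SIZES:
        result.append(("codelet", n))
    for r in CODELET_SIZES:
        if r < n and n % r == 0:
            for sub in plans(n // r):
                result.append(("ct", r, sub))
    return result


def plan_size(plan):
    """Transform size handled by a plan."""
    if plan[0] == "codelet":
        return plan[1]
    return plan[1] * plan_size(plan[2])


def execute(plan, x):
    """Run a plan on x using decimation-in-time Cooley-Tukey steps."""
    if plan[0] == "codelet":
        return naive_dft(x)  # stand-in for a generated codelet
    _, r, sub = plan
    n = len(x)
    m = n // r
    # Transform the r interleaved subsequences x[s], x[s+r], ... of length m.
    subs = [execute(sub, x[s::r]) for s in range(r)]
    # Combine with twiddle factors: X[k] = sum_s w_n^(s*k) * Y_s[k mod m].
    return [sum(cmath.exp(-2j * cmath.pi * s * k / n) * subs[s][k % m]
                for s in range(r))
            for k in range(n)]


def predicted_cost(plan):
    """Toy analytic model (an assumption, NOT the paper's model):
    counts complex multiply-adds only."""
    if plan[0] == "codelet":
        return plan[1] ** 2
    _, r, sub = plan
    n = r * plan_size(sub)
    return r * predicted_cost(sub) + n * r


def measured_time(plan, x, repeats=3):
    """Empirical cost: best wall-clock time over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        execute(plan, x)
        best = min(best, time.perf_counter() - start)
    return best


def choose_plan(n, x, mode):
    """Pick a plan by empirical search ("measure") or by the model ("model")."""
    candidates = plans(n)
    if not candidates:
        raise ValueError(f"no plan for size {n} with codelets {CODELET_SIZES}")
    if mode == "measure":
        # Every candidate is executed, so search cost grows with the plan space.
        return min(candidates, key=lambda p: measured_time(p, x))
    # Each candidate is only evaluated by a formula; nothing is executed.
    return min(candidates, key=predicted_cost)


if __name__ == "__main__":
    n = 16
    x = [complex(i, -i) for i in range(n)]
    reference = naive_dft(x)
    for mode in ("model", "measure"):
        plan = choose_plan(n, x, mode)
        y = execute(plan, x)
        err = max(abs(a - b) for a, b in zip(y, reference))
        print(f"{mode}: {plan}  max error vs. naive DFT = {err:.1e}")
```

Both modes return a correct transform; they differ only in how a plan is selected. The "measure" mode pays for running every candidate, which is the overhead the abstract targets, while the "model" mode only evaluates a cost formula per candidate. This toy enumerates the plan space exhaustively and is not meant to reflect how FFTW prunes its search.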
Similar resources
A Search Optimization in FFTW
Generating high-performance fast Fourier transform (FFT) libraries for different computer architectures is an important task. Architecture vendors sometimes have to rely on dedicated experts to tune FFT implementations on each new platform. Fastest Fourier transform in the West (FFTW) replaces this tedious, repeated work with an adaptive FFT library. It automatically generates FFT code that is...
The Fastest Fourier Transform in the West (MIT-LCS-TR-728)
This paper describes FFTW, a portable C package for computing the one- and multidimensional complex discrete Fourier transform (DFT). FFTW is typically faster than all other publicly available DFT software, including the well-known FFTPACK and the code from Numerical Recipes. More interestingly, FFTW is competitive with or better than proprietary, highly tuned codes such as Sun’s Performance Libr...
The Design and Implementation of FFTW 3
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs o...
Autotuning of FFTW Library for Massively Parallel Supercomputers Scalability Improvements for DFT Codes Due to the Implementation of the 2D Domain Decomposition Algorithm
In this paper, we present part of the work carried out by CINECA within the PRACE-2IP project to study the performance effect of implementing a 2D domain decomposition algorithm in DFT codes that use the standard 1D (or slab) parallel domain decomposition. The performance of this new algorithm is tested on two example applications: Quantum Espresso, a popular...
Automatic Generation of Vectorized Fast Fourier Transform Libraries for the Larrabee and AVX Instruction Set Extension
The discrete Fourier transform (DFT) and its fast algorithms (fast Fourier transforms, or FFTs) are among the most important computational building blocks in signal processing and scientific computing. Consequently, a number of high-performance DFT libraries are available, including Intel’s Integrated Performance Primitives (IPP), FFTW [6], and libraries generated by Spiral [9, ...